ABSTRACT

A model for the evolution of biological systems in 
the absence of a nucleic acid-like genome is proposed 
and applied to model the earliest living organisms 
— protocells composed of membrane encapsulated 
peptides. Assuming that the peptides can make and 
break bonds between amino acids, and bonds in non- 
functional peptides are more likely to be destroyed 
than in functional peptides, it is demonstrated that 
the catalytic capabilities of the system as a whole can 
increase. This increase is defined to be non-genomic 
evolution. The relationship between the proposed 
mechanism for evolution and recent experiments on 
self-replicating peptides is discussed. 

1. INTRODUCTION 

The ability of organic molecules to self-organize into 
self-sustaining, reproducing and evolving structures 
governed the transformation of matter from inani- 
mate to animate on the early earth. Probably the 
earliest such structures were protocells — membrane- 
enclosed, cell-like structures capable of supporting 
essential life functions, such as the capture and uti- 
lization of energy and synthesis of proteins. [5] In 
modern organisms, most of these life functions are 
performed by proteins which are, in turn, synthesized 
on an RNA template. It is, however, unlikely that 
both proteins and RNA arose simultaneously and 
immediately became interconnected. The discovery 
of catalytic properties of RNA led to a suggestion
that the present world of nucleic acids and proteins 
was preceded by the “RNA World,” wherein RNA 
molecules alone acted as both catalysts and informa- 
tion storage systems. [2; 1] This concept, however, 
encounters considerable difficulties. RNA is fragile 
and no efficient prebiotic syntheses of its building 
blocks have been found. Furthermore, RNA can- 
not be readily incorporated into membranes to per- 
form functions which, in modern cells, include energy 
transduction and transport. Finally, since there is no 
relationship between the function of a catalytic RNA 
and the function, if any, of the protein for which it 
can code, there is no clear path from the RNA World 
to today’s world of protein catalysis and nucleic acid 
information storage. We therefore hypothesize that 
initially protocells evolved in the absence of a nucleic 
acid-based genome and only later did coded information
storage emerge. While peptides do not suffer
from similar problems as RNA, amino acids cannot
base-pair like nucleic acids, so it is not clear how 
peptides, alone, could transfer information between 
generations. Thus a new conception of “evolution” is 
necessary that does not require a nucleic acid-based, 
or similar, genome. 

Central to this new concept of non-genomic evolu- 
tion is the emergence of peptide-bond forming pro- 
toenzymes (ligases). In all likelihood, they were ini- 
tially very weak, non-specific catalysts, joining amino 
acids to form peptides of various lengths and se- 
quences. A few of the peptides so generated could 
have been better catalysts of peptide bond formation 
than the protoenzymes which formed them. These 
better protoenzymes would, in turn, generate even 
more peptides, increasing the rate at which a pro- 
tocell “searched” the space of all peptides for func- 
tional ones. Some of the peptides generated in this 
search would undoubtedly function as proteases, cut- 
ting peptide bonds. Since proteases cleave unstruc- 
tured peptides more rapidly than structured ones, 
and since functional peptides have to have some de- 
gree of ordered structure, the proteases would prefer- 
entially destroy non-functional peptides. Occasion- 
ally, the newly produced peptides would be capable 
of performing novel functions. If they integrated into 
the protocellular metabolism, they could increase its 
capabilities. This process would eventually lead to 
the emergence (or utilization) of nucleic acids and 
their coupling with peptides to yield a genomic sys- 
tem. 

For this process to be effective, it is required that 
protocells grow and divide either by acquiring am- 
phiphilic material from the environment or by pro- 
ducing it internally. The contents of the two “off- 
spring” protocells would not be identical and some 
would not contain the proper suite of components for 
self-maintenance. Nevertheless, over time, the catalytic
efficiency of a community of protocells might
increase. This increase in overall efficiency is
non-genomic evolution.

Recent breakthroughs in experimental protein chem- 
istry open the gates for systematic experimental and 
theoretical tests of the ideas underlying non-genomic
evolution. Szostak and Roberts [6] have modified 
the methods of in vitro evolution, previously only 
applicable to nucleic acids, to select peptides with 
specific properties. This work will provide needed
information on the distribution of catalytic abilities
among small peptides. In a series of elegant papers, 
Ghadiri and co-workers [4; 3; 7] have produced a self- 
replicating peptide system with an inherent error- 
correction mechanism and have demonstrated the 
evolution of populations of peptides. Most recently, 
Chmielewski et al. [8] have constructed another peptide
system capable of auto- and cross-catalysis and
generating self-replicating peptides that were not present 
in the original mixture. 

2. THE INHERITED EFFICIENCIES MODEL

To examine the evolutionary potential of a non-genomic 
system, we have employed a simple, computationally 
tractable model which is still capable of capturing 
the essential biochemical features of the real system. 

In this model, protocellular walls are permeable to 
amino acids but not to oligopeptides of any length. 
Within the protocell, the formation and destruction 
(also called hydrolysis) of bonds between consecu- 
tive amino acids in oligopeptides (peptide bonds) 
occur through catalyzed, albeit possibly very inef- 
ficient, pathways. A peptide of any length can act 
in a double role as a substrate for polymerization 
or hydrolysis, or as a catalyst of chemical reactions. 
Since only two reactions are considered in the present 
model, all peptides are characterized by three traits: 
their length and their efficiencies as catalysts of lig- 
ation and hydrolysis of peptide bonds. These effi- 
ciencies can be interpreted as the inverse of turnover 
rates and are currently assumed to be independent 
of each other. 

In a system composed of different types of amino 
acids, peptides of the same length but different com- 
position vary in their catalytic ability. In a detailed 
model, this can be accounted for by providing micro- 
scopic rules that relate the peptide sequence to its 
catalytic efficiency. Since these rules, however, are 
not known at present, we adopt a stochastic model, 
in which the specific identities of amino acids are not 
considered. Instead, the dependence of the catalytic
efficiency, $e$, on the sequence is captured by assuming
that the efficiencies of peptides of length $n$ for
catalyzing ligation and hydrolysis reactions are
distributed with probabilities $p^L_n(e)$ and $p^H_n(e)$,
respectively. In the current implementation, these
probability distributions are Gaussian.
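
As a sketch of what such a distribution looks like, and
assuming the standard normal form with a length-dependent
mean and width, the ligation case can be written as below.
The symbol $\bar{e}_L(n)$ for the mean is a label introduced
here for illustration; the hydrolysis case follows by
replacing $L$ with $H$.

```latex
p^L_n(e) = \frac{1}{\sqrt{2\pi}\,\sigma_L(n)}
           \exp\!\left[-\frac{\left(e - \bar{e}_L(n)\right)^2}
                             {2\,\sigma_L(n)^2}\right]
```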

The position of the maximum of each distribution
function increases, in a sigmoidal fashion, with the
length of the polymer. The parameter $\eta_L$ ($\eta_H$)
sets the rate at which the mean efficiencies vary
between their minimum value of $e^L_{\min}$ ($e^H_{\min}$)
and their maximum value of $e^L_{\max}$ ($e^H_{\max}$),
and $n_L$ ($n_H$) is the length at which the mean
efficiency is halfway between its maximum and minimum.
This relationship captures the biochemically
plausible property that initially the efficiencies
increase, on average, only slightly with the length of
the polymer. Only when peptides reach lengths sufficient
for them to be able to adopt an ordered
three-dimensional structure do the average efficiencies
start increasing markedly. Then, for even longer polymers,
the average efficiencies again stabilize, since gaining
additional length no longer produces significant
improvement in catalytic properties. The widths
of the distributions, $\sigma_L(n)$ ($\sigma_H(n)$), are
chosen such that the probabilities of sampling negative
efficiencies are quite small. If such instances occur,
the efficiencies are reflected across the origin.
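
The sampling step described above can be sketched in a few
lines of Python. This is an illustration only: the logistic
form of the mean, the function names, and the rule that the
width is a fixed fraction of the mean (width_fraction) are
assumptions, not the forms used in the simulations. The
default parameter values are those quoted in Section 3.

```python
import math
import random


def mean_efficiency(n, e_min=1.0, e_max=1000.0, eta=0.235, n_half=20.0):
    """Mean efficiency for length n, assuming a logistic (sigmoidal) form."""
    return e_min + (e_max - e_min) / (1.0 + math.exp(-eta * (n - n_half)))


def sample_efficiency(n, rng, width_fraction=0.25, **params):
    """Draw one efficiency from a Gaussian centred on the mean for length n.

    width_fraction is a hypothetical choice: sigma(n) = width_fraction * mean.
    """
    mean = mean_efficiency(n, **params)
    sigma = width_fraction * mean
    # A negative draw is reflected across the origin, as in the text.
    return abs(rng.gauss(mean, sigma))


rng = random.Random(0)
for n in (5, 20, 40):
    print(n, mean_efficiency(n), sample_efficiency(n, rng))
```

With these assumed defaults the mean at n = n_half is exactly
halfway between e_min and e_max, which matches the definition
of the midpoint length given above.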

When two peptides are joined together, the catalytic 
efficiencies of the product of this reaction are related 
to the efficiencies of the reactants. For example, 
the product of the addition of a small peptide to 
a much longer peptide has efficiencies which closely 
resemble the efficiencies of the longer “parent”. To 
underscore this relationship the model is called an 
Inherited Efficiencies Model. Statistically, catalytic 
efficiencies of the product of a ligation reaction are 
chosen from a conditional probability, $P_{k,l}(e \mid e', e'')$,
which gives the probability of creating a peptide of
length $n = k + l$ with efficiency $e$ given peptides
of length $k$ and $l$ with efficiencies $e'$ and $e''$,
respectively. Since this probability is a property of the
ligation process, the same form is used to assign
efficiencies of ligation and hydrolysis to the product. In
the present implementation, this probability has the
form of a multivariate Gaussian, in which, to simplify
the notation, $\sigma_j$ stands for $\sigma_L(j)$
($\sigma_H(j)$) and $\bar{e}_j$ for $\bar{e}_L(j)$
($\bar{e}_H(j)$) for the ligation (hydrolysis) properties
of the substrates or the product.

A similar approach is taken to define a conditional 
probability for the products of hydrolysis reactions. 
However, since hydrolytic enzymes act more efficiently
on disordered peptides than on ordered peptides, not 
all peptide bonds are equally likely to be hydrolyzed. 
Although our model does not explicitly include the 
degree of ordering of different polymers, we exploit 
the relationship between structure and function: with- 
out a stable three-dimensional structure, high effi- 
ciency protein catalysis is impossible. In the current 
implementation of the model, the degree of structure 
of a peptide, s, is computed using: 

$s = \max[e^L, e^H]$. (6)


Clearly, other mappings between efficiency and structure
are possible. The bias of hydrolytic enzymes
towards disordered peptides is modelled by a decreasing
sigmoidal function of structure, $\beta(s)$. As
stipulated by the model, this implies that efficient
catalysts are less likely to be hydrolyzed than
inefficient, presumably disordered, peptides. The maximum
value of the bias, almost always equal to unity,
will be denoted by $\beta_{\max}$ and the minimum value by
$\beta_{\min}$. The degree of structure for which the bias is
halfway between its maximum and minimum values
will be denoted $s_0$ and the rate of decrease of the
bias will be controlled by a parameter denoted as $r_b$.
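
A minimal sketch of such a bias function follows, assuming a
logistic shape; the exact functional form and the function
name are illustrative assumptions. The default values are the
ones quoted for the second set of simulations in Section 3.
The logistic is written with tanh so that large values of $s$
do not overflow.

```python
import math


def hydrolysis_bias(s, beta_max=1.0, beta_min=0.01, s0=58.0, r_b=0.065):
    """Decreasing sigmoid in the degree of structure s (assumed logistic form).

    0.5 * (1 - tanh(x / 2)) is algebraically equal to 1 / (1 + exp(x)).
    """
    x = r_b * (s - s0)
    falling = 0.5 * (1.0 - math.tanh(0.5 * x))
    return beta_min + (beta_max - beta_min) * falling


for s in (1.0, 58.0, 1000.0):
    print(s, hydrolysis_bias(s))
```

At s = s0 the bias is halfway between beta_max and beta_min,
and for highly structured peptides it approaches beta_min,
the quantity whose value is varied in Section 3.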

When a peptide is hydrolyzed to form two new peptides,
the catalytic efficiencies of the “offspring” are,
once again, chosen from a conditional probability,
$P_n(e', e'' \mid e)$, of creating peptides of lengths $k$ and
$l$, with catalytic efficiencies $e'$ and $e''$, respectively,
from a peptide of length $n = k + l$ with efficiency $e$.
To find the form of this conditional probability we
note that the making and breaking of peptide bonds
are, in a way, inverses of each other and, therefore,
the conditional probabilities describing the properties
of the products of ligation and hydrolysis reactions
are related by Bayes’s Theorem. Considering
that in a peptide of length $n$, $n - 1$ bonds can be
hydrolyzed and including the structural bias function
$\beta(s)$, we obtain the conditional probability for the
products of hydrolysis.
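
One way to write this relation is sketched below. It assumes
that Bayes's Theorem is applied with the unconditional
length-dependent distributions as priors and that each of the
$n - 1$ bonds is weighted equally before the bias is applied.
The placement of the prefactor and the omission of any
normalization constant are assumptions made for illustration;
$p_j$ stands for $p^L_j$ or $p^H_j$ as appropriate.

```latex
P_n(e', e'' \mid e) \;\propto\; \frac{\beta(s)}{n - 1}\,
  \frac{P_{k,l}(e \mid e', e'')\, p_k(e')\, p_l(e'')}{p_n(e)},
  \qquad n = k + l
```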

Evaluating this expression for the specific forms of
the probabilities gives a closed form in which $s$ is
the degree of structure of the “parent” $n$-mer.

Simulations of the Inherited Efficiencies Model are 
carried out using a Monte Carlo method. Each Monte 
Carlo cycle consists of three stages: (1) the reaction 
to be performed (ligation or hydrolysis) is chosen, (2) 
the substrate or substrates are chosen from the list 
of peptides present in the system, (3) the properties 
of the product or products of the reaction are sam- 
pled from the appropriate distributions and the list 
of polymers is updated. The number of monomers in 
the protocell is held fixed to reflect the equilibrium 
between the concentrations of amino acids inside the 
protocell and in the environment, facilitated by the 
permeation properties of the protocellular boundary. 
The probabilities for the two reaction types are com- 
puted from the corresponding total catalytic capa- 
bilities of the peptides within the protocell. Once 
the reaction type is chosen, the probabilities of indi- 
vidual reactions are used to choose the substrate(s) 
of the reaction. Finally, the properties of the products
of the reactions are chosen from the conditional
probabilities described above.
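
The first two stages of a cycle can be sketched as follows.
This is a hedged illustration, not the simulation code: the
function names, the three example peptides, and the choice to
weight hydrolysis substrates by a precomputed bias value are
all assumptions. Stage (3) is omitted because it depends on
the conditional distributions discussed above.

```python
import random


def choose_reaction(lig_efficiencies, hyd_efficiencies, rng):
    """Stage 1: pick the reaction type in proportion to the summed efficiencies."""
    total_lig = sum(lig_efficiencies)
    total_hyd = sum(hyd_efficiencies)
    return rng.choices(["ligation", "hydrolysis"],
                       weights=[total_lig, total_hyd])[0]


def choose_substrate(weights, rng):
    """Stage 2: pick the index of one substrate in proportion to its weight."""
    return rng.choices(list(range(len(weights))), weights=weights)[0]


rng = random.Random(1)
lig = [1.0, 2.0, 50.0]   # hypothetical ligation efficiencies of three peptides
hyd = [1.0, 1.0, 5.0]    # hypothetical hydrolysis efficiencies of the same peptides
bias = [1.0, 0.9, 0.05]  # hypothetical hydrolysis bias for each peptide

reaction = choose_reaction(lig, hyd, rng)
if reaction == "hydrolysis":
    # Assumption: disordered (high-bias) peptides are the likelier targets.
    target = choose_substrate(bias, rng)
else:
    # Assumption: ligation substrates are drawn uniformly in this sketch.
    target = choose_substrate([1.0] * len(lig), rng)
print(reaction, target)
```

Drawing the reaction type from the summed efficiencies means
that a protocell rich in efficient ligases spends most of its
cycles on ligation, which is the feedback that lets catalytic
capability grow.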

3. RESULTS AND DISCUSSION 

Several properties of the inherited efficiencies model 
have been explored via Monte Carlo simulation. For 
a range of parameters, the increase in the catalytic 
capabilities of the protocell that defines non-genomic 
evolution has been observed. Here we describe the 
results of simulations aimed at assessing the role played 
by the details of the hydrolysis bias and the balance 
between the ease of creation of efficient ligases and 
proteases. 

The bias in the action of the hydrolytic enzymes
towards the destruction of less efficient peptides is
expected to play an important role in non-genomic
evolution. Several simulations were performed to explore
this. In all cases, the number of monomers
within the protocell was fixed at 1000, the maximum
efficiency means ($e^L_{\max}$ and $e^H_{\max}$) were 1000.0,
the minimum efficiency means ($e^L_{\min}$ and $e^H_{\min}$)
were 1.0 and the maximum of the bias ($\beta_{\max}$) was 1.0.
The simulations were performed for $2 \times 10^6$ Monte
Carlo cycles. Variations in the location of the midpoint
of the bias ($s_0$) and of the rate of decrease
of the bias ($r_b$) were not observed to have a qualitative
effect on the behavior of the model (data not
shown). Changes to the relative depth of the bias had
a marked effect, however. The results of three
representative simulations, for $\beta_{\min}$ = 0.05, 0.025, and
0.01, are shown in Figure 1. In these simulations,
the functions governing the means of the efficiency
distributions were adjusted to make the formation
of ligases slightly easier than the formation of proteases
($n_L = 20$, $\eta_L = 0.235$, $n_H = 21$, $\eta_H = 0.230$).
As the minimum value of the hydrolysis bias was
decreased from 0.1 to 0.001, the behavior of the protocell
changed: systems with $\beta_{\min} < 0.01$ exhibited
a large and sustained increase in both the average
length and average catalytic efficiencies of the peptides
within the protocell. In contrast, systems with
$\beta_{\min} > 0.05$ showed little increase in either the
average length or the average catalytic efficiencies of
their peptides. A single system was simulated with
$\beta_{\min} = 0.025$ and it exhibited very slight growth in
the length and catalytic efficiencies of its peptides.
Since $\beta_{\min}$ is the value of the hydrolysis bias for
highly structured, and therefore highly efficient, peptides,
its value determines the “lifespan” of highly
efficient peptides. Large values of $\beta_{\min}$ mean that
the probability that a highly efficient peptide will
be hydrolyzed is not much lower than the probability
that a peptide of average efficiency will be
hydrolyzed. Thus, when highly efficient peptides are
generated in a system with a large $\beta_{\min}$, they are
hydrolyzed before their actions greatly affect the
population of peptides within the protocell and the rate
with which the protocell explores the space of all
peptides is not changed. In contrast, for small values
of $\beta_{\min}$, the probability that a highly efficient
peptide will be hydrolyzed is much smaller than the
probability that a peptide of average efficiency will
be hydrolyzed. Thus, highly efficient peptides are
long-lived and their presence can increase the rate
at which the protocell generates new peptides. The
protocell can then evolve non-genomically.

Fig. 1. Results for $\beta_{\min}$ = 0.05 (solid lines), 0.025
(dashed lines), and 0.01 (dotted lines). (a) The average
ligation efficiency of the polymers in the protocell.
(b) The average length of the polymers within the
protocell. (c) The number of polymers within the protocell.

Fig. 2. Results for $n_L$ = 21 (solid lines), 25 (dashed
lines), and 30 (dotted lines) for $n_H$ = 20. (a) The average
ligation efficiency of the polymers in the protocell.
(b) The average length of the polymers within the
protocell. (c) The number of polymers within the protocell.

The rate at which novel peptides are generated within 
a protocell not only depends on the depth of the 
hydrolysis bias but is also sensitive to the balance 
between the rates of creation of small, efficient lig- 
ases and small, efficient proteases. Clearly, if highly 
efficient ligases are much more easily formed than 
efficient proteases, the protocell will fill with a di- 
verse array of long peptides. Eventually, the proto- 
cell will burst. At the other extreme, if small, ef- 
ficient proteases are much more easily formed than 
small, efficient ligases, the formation of long peptides 
will proceed slowly and any peptides formed will be 
hydrolyzed rapidly; the overall catalytic efficiency of 
the protocell will therefore remain small. A series of 
simulations were performed to examine the sensitiv- 
ity of non-genomic evolution to slight imbalances in 
the ease of creation of ligases and proteases. Par- 
ticular attention was paid to cases where proteases 
were slightly easier to produce than ligases. As before,
the number of monomers within the protocell
was fixed at 1000, the maximum efficiency means
($e^L_{\max}$ and $e^H_{\max}$) were 1000.0, the minimum
efficiency means ($e^L_{\min}$ and $e^H_{\min}$) were 1.0 and
the maximum of the bias ($\beta_{\max}$) was 1.0. The minimum
of the bias ($\beta_{\min}$) was set to 0.01, the rate of
decrease of the bias ($r_b$) to 0.065 and the midpoint of
the bias decrease ($s_0$) to 58.0. The parameters governing
the means of the hydrolysis efficiency distributions were
fixed to $n_H = 20$ and $\eta_H = 0.235$. The parameter
governing the rate of change of the means of the ligation
efficiency distributions, $\eta_L$, was fixed at 0.230 and
three values for $n_L$ were considered: 21, 25, and 30. The
simulations were performed for $2 \times 10^6$ Monte Carlo
cycles.

The results of these simulations are displayed in Figure
2. Clearly shown is the sensitive dependence of
the rate of evolution on the ease of creation of ligases:
as $n_L$ increases, the rate of improvement in
the average efficiency and length of the polymers in
the protocell decreases. No real improvement is seen
when $n_L = 30$. These data demonstrate that the
rates with which ligases and proteases are formed 
must be in a close balance for non-genomic evolu- 
tion to occur. 

4. SUMMARY 

The results presented here demonstrate the possibil- 
ity of a novel mechanism of early protocellular evo- 
lution. This mechanism does not require the pres- 
ence of a genome, nor does it rely on any form of 
sequence complementarity or the exact replication 
of protein sequences. In fact, the sloppy replication 
of protein sequences is an advantage in the earliest 
phase of evolution because it allows for the rapid 
exploration of the space of proteins and the discov- 
ery of new functions. It is the preservation of these 
functions and their interrelationships which must be 
maintained during this early stage of evolution, not 
the identity of the actors performing those functions. 
Further, evolution progresses through improvements
of the whole community rather than of the most fit
individuals.

The proposed model makes truly minimal assump- 
tions — the existence of polymers capable of per- 
forming constructive and destructive processes and 
some preference for the destruction of non-functional 
polymers. This preference, well-motivated by the 
known biochemistry of protein enzymes, drives the 
evolution of protocells. 

Although specific interactions between peptides are 
not included here, they can be readily incorporated 
into the proposed concept of evolution. In fact, there 
is no conflict between this concept and the work of 
Ghadiri and Chmielewski. Since non-genomic evolu- 
tion is necessarily limited by its inability to transfer 
information sufficiently precisely, specificity of peptide
interactions would improve the fidelity of information
transfer, hence increasing the evolutionary potential
of the system. Ultimately, however, a truly
advanced protocell would have to find a better method
of transferring information to its offspring.

The model can be naturally extended to include the 
possibility of producing peptides capable of performing
new protocellular functions and to describe growth
and division of protocells. Perhaps more importantly,
recent advancements that allow the in vitro evolution
of catalytic peptides [6] provide firm ground for
improving the model and testing its predictions
experimentally.